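Recovering the underlying directed acyclic graph (DAG) structure from observational data is highly challenging because of the combinatorial nature of the DAG-constrained optimization problem. Recently, DAG learning has been cast as a continuous optimization problem by characterizing the DAG constraint as a smooth equality constraint, generally based on polynomials over the adjacency matrix. Existing methods place very small coefficients on the high-order polynomial terms for stabilization, arguing that large coefficients on high-order terms are harmful because of numerical explosion. In contrast, we find that large coefficients on high-order terms are beneficial for DAG learning: when the spectral radius of the adjacency matrix is small, larger coefficients on high-order terms approximate the DAG constraint much better than their small counterparts. Based on this, we propose a novel DAG learning method with an efficient truncated matrix power iteration to approximate geometric-series-based DAG constraints. Empirically, our DAG learning method outperforms the previous state of the art in various settings, often by a factor of 3 or more in terms of structural Hamming distance.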
translated by 谷歌翻译
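To illustrate the role of the coefficients on high-order terms, here is a minimal sketch comparing two polynomial acyclicity penalties on a small matrix. The function names, the choice of the elementwise square `B = A * A`, and the two specific forms (coefficients 1/k! versus coefficients 1) are assumptions made for illustration; the loops are naive and are not the truncated matrix power iteration proposed in the abstract.

```python
import numpy as np

def exp_style_constraint(A):
    """tr(sum_{k=1}^{d} B^k / k!) with B = A * A (elementwise square).

    The 1/k! coefficients shrink quickly, so long cycles contribute little.
    """
    B = A * A
    d = B.shape[0]
    total, term = 0.0, np.eye(d)
    for k in range(1, d + 1):
        term = term @ B / k          # term now holds B^k / k!
        total += np.trace(term)
    return total

def geometric_constraint(A):
    """tr(sum_{k=1}^{d} B^k): every order k keeps coefficient 1."""
    B = A * A
    d = B.shape[0]
    total, power = 0.0, np.eye(d)
    for _ in range(d):
        power = power @ B            # power now holds the next power of B
        total += np.trace(power)
    return total

# Hypothetical test matrices: a chain (acyclic) and a weighted 3-cycle.
chain = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])
cycle = 0.5 * np.array([[0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0],
                        [1.0, 0.0, 0.0]])

for name, A in [("chain", chain), ("cycle", cycle)]:
    print(name, exp_style_constraint(A), geometric_constraint(A))
```

Both penalties vanish on the chain because every power of `B` has zero trace. On the 3-cycle with edge weight 0.5, only the third-order term contributes, so the geometric form gives 3·0.5⁶ while the 1/k! form gives 0.5·0.5⁶, a signal six times weaker for the same cycle.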
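Nonlinear independent component analysis (ICA) aims to recover the underlying independent latent sources from their observable nonlinear mixtures. How to make the nonlinear ICA model identifiable up to certain trivial indeterminacies is a long-standing problem in unsupervised learning. Recent breakthroughs reformulate the standard independence assumption on the sources as conditional independence given some auxiliary variables (e.g., class labels and/or domain/time indices), which act as weak supervision or inductive bias. However, nonlinear ICA with unconditional priors cannot benefit from such developments. We explore an alternative path and consider only assumptions on the mixing process, such as structural sparsity or independent influences. We show that, under specific instantiations of such constraints, the independent latent sources can be identified from their nonlinear mixtures up to a permutation and a component-wise transformation, thus achieving nontrivial identifiability of nonlinear ICA without auxiliary variables. We provide estimation methods and validate the theoretical results experimentally. The results on image data suggest that our conditions may hold in many practical data-generating processes.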
State-of-the-art causal discovery methods usually assume that the observational data is complete. However, the missing data problem is pervasive in many practical scenarios such as clinical trials, economics, and biology. One straightforward way to address the missing data problem is first to impute the data using off-the-shelf imputation methods and then apply existing causal discovery methods. However, such a two-step method may suffer from suboptimality, as the imputation algorithm may introduce bias for modeling the underlying data distribution. In this paper, we develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. Focusing mainly on the assumptions of ignorable missingness and the identifiable additive noise models (ANMs), MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization (EM) framework. In the E-step, in cases where computing the posterior distributions of parameters in closed-form is not feasible, Monte Carlo EM is leveraged to approximate the likelihood. In the M-step, MissDAG leverages the density transformation to model the noise distributions with simpler and specific formulations by virtue of the ANMs and uses a likelihood-based causal discovery algorithm with directed acyclic graph constraint. We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
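$\texttt{gcastle}$ is an end-to-end Python toolbox for causal structure learning. It provides functionality for generating data from either a simulator or real-world datasets, learning causal structure from the data, and evaluating the learned graph, together with useful practices such as prior knowledge insertion, preliminary neighborhood selection, and post-processing to remove false discoveries. Compared with related packages, $\texttt{gcastle}$ includes many recently developed gradient-based causal discovery methods with optional GPU acceleration. $\texttt{gcastle}$ is convenient both for researchers, who can experiment with the code directly, and for practitioners, who can use a graphical user interface. The current version also provides three real-world datasets from telecommunications. $\texttt{gcastle}$ is available under the Apache License 2.0 at \url{https://github.com/huawei-noah/trustworthyai/tree/master/gcastle}.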
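This paper studies the problem of learning causal structures from observational data. We reformulate the structural equation model (SEM) in a form parameterized by a binary graph adjacency matrix and show that, if the original SEM is identifiable, then the binary adjacency matrix can be identified up to supergraphs of the true causal graph under mild conditions. We then use the reformulated SEM to develop a causal structure learning method that can be trained efficiently with gradient-based optimization, by leveraging a smooth characterization of acyclicity and the Gumbel-Softmax approach to approximate the binary adjacency matrix. The obtained entries are typically close to zero or one and can be easily thresholded to identify edges. We conduct experiments on synthetic and real datasets to validate the effectiveness of the proposed method, and show that it readily accommodates different smooth model functions and achieves much improved performance on most of the datasets considered.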
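The relaxation of a binary adjacency matrix can be sketched as follows, using the two-class special case of Gumbel-Softmax (often called Gumbel-sigmoid). The function name, the logit parameterization, the temperature value, and the example logits are hypothetical and chosen for illustration; this is not the paper's implementation.

```python
import numpy as np

def gumbel_sigmoid(logits, temperature, rng):
    """Draw a relaxed binary sample for each entry of `logits`.

    Two-class Gumbel-Softmax: add the difference of two Gumbel noises to the
    logit, then squash with a temperature-scaled sigmoid.
    """
    # Clip away from 0 and 1 so both logarithms stay finite.
    u = np.clip(rng.uniform(size=(2,) + logits.shape), 1e-12, 1.0 - 1e-12)
    gumbel = -np.log(-np.log(u))
    z = (logits + gumbel[0] - gumbel[1]) / temperature
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical edge logits for a 3-node graph: strongly positive means "edge".
logits = np.array([[-20.0, 20.0, -20.0],
                   [-20.0, -20.0, 20.0],
                   [-20.0, -20.0, -20.0]])

soft_adjacency = gumbel_sigmoid(logits, temperature=0.2, rng=rng)
edges = soft_adjacency > 0.5         # threshold to read off the edges
print(np.round(soft_adjacency, 3))
print(edges.astype(int))
```

At a low temperature the relaxed entries concentrate near zero or one, which is consistent with the abstract's remark that the learned entries can be thresholded directly.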
This paper studies the quantization of heavy-tailed data in some fundamental statistical estimation problems, where the underlying distributions have bounded moments of some order. We propose to truncate and properly dither the data prior to uniform quantization. Our main standpoint is that (near) minimax rates of estimation error are achievable merely from the quantized data produced by the proposed scheme. In particular, concrete results are worked out for covariance estimation, compressed sensing, and matrix completion, all agreeing that the quantization only slightly worsens the multiplicative factor. In addition, we study compressed sensing where both the covariate (i.e., sensing vector) and the response are quantized. Under covariate quantization, although our recovery program is non-convex because the covariance matrix estimator lacks positive semi-definiteness, all local minimizers are proved to enjoy a near-optimal error bound. Moreover, using a concentration inequality for product processes and a covering argument, we establish a near-minimax uniform recovery guarantee for quantized compressed sensing with heavy-tailed noise.
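The truncate, dither, then quantize pipeline can be sketched as below. The function names, the choice of a uniform dither on [-delta/2, delta/2], the midpoint quantization grid, and all parameter values are assumptions for illustration; the abstract does not specify them, and this is not presented as the authors' scheme.

```python
import numpy as np

def truncate(x, tau):
    """Shrink heavy-tailed values so that |x| <= tau, keeping the sign."""
    return np.sign(x) * np.minimum(np.abs(x), tau)

def uniform_quantize(x, delta):
    """Map each value to the midpoint of its cell on a grid of width delta."""
    return delta * (np.floor(x / delta) + 0.5)

def truncate_dither_quantize(x, tau, delta, rng):
    """Truncate, add uniform dither, then apply the uniform quantizer."""
    dither = rng.uniform(-delta / 2, delta / 2, size=np.shape(x))
    return uniform_quantize(truncate(x, tau) + dither, delta)

rng = np.random.default_rng(0)

# Hypothetical heavy-tailed sample: Student's t with 3 degrees of freedom.
sample = rng.standard_t(df=3, size=100_000)
quantized = truncate_dither_quantize(sample, tau=10.0, delta=1.0, rng=rng)
print(np.mean(truncate(sample, 10.0)), np.mean(quantized))

# A constant input shows that the dither makes the quantizer unbiased on average.
constant = np.full(200_000, 0.37)
constant_q = truncate_dither_quantize(constant, tau=10.0, delta=1.0, rng=rng)
print(constant_q.mean())
```

With a uniform dither of the same width as the quantization cell, the expected quantizer output equals the truncated input. This is why averages computed from the coarse quantized values can still track the underlying quantities.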
We present Self Meta Pseudo Labels, a novel semi-supervised learning method similar to Meta Pseudo Labels but without the teacher model. We introduce a novel way to use a single model for both generating pseudo labels and classification, allowing us to store only one model in memory instead of two. Our method attains similar performance to the Meta Pseudo Labels method while drastically reducing memory usage.
A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we fine-tune a general-purpose, large language model to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We compare to several segmentation strategies and find that our approach improves BLEU score on three languages by an average of 2.7 BLEU overall compared to an automatic punctuation baseline. Further, we demonstrate the effectiveness of two constrained decoding strategies to improve well-formedness of the model output from above 99% to 100%.
Recently, there has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we first examined the existing synthesized datasets and discovered that state-of-the-art text-to-SQL algorithms did not further improve on popular benchmarks when trained with augmented synthetic data. We observed two shortcomings: illogical synthetic SQL queries from independent column sampling and arbitrary table joins. To address these issues, we propose a novel synthesis framework that incorporates key relationships from schema, imposes strong typing, and conducts schema-distance-weighted column sampling. We also adopt an intermediate representation (IR) for the SQL-to-text task to further improve the quality of the generated natural language questions. When existing powerful semantic parsers are pre-finetuned on our high-quality synthesized data, our experiments show that these models have significant accuracy boosts on popular benchmarks, including new state-of-the-art performance on Spider.
There has been great progress in unifying various table-to-text tasks using a single encoder-decoder model trained via multi-task learning (Xie et al., 2022). However, existing methods typically encode task information with a simple dataset name as a prefix to the encoder. This not only limits the effectiveness of multi-task learning, but also hinders the model's ability to generalize to new domains or tasks that were not seen during training, which is crucial for real-world applications. In this paper, we propose compositional task configurations, a set of prompts prepended to the encoder to improve cross-task generalization of unified models. We design the task configurations to explicitly specify the task type, as well as its input and output types. We show that this not only allows the model to better learn shared knowledge across different tasks at training, but also allows us to control the model by composing new configurations that apply novel input-output combinations in a zero-shot manner. We demonstrate via experiments over ten table-to-text tasks that our method outperforms the UnifiedSKG baseline by noticeable margins in both in-domain and zero-shot settings, with average improvements of +0.5 and +12.6 from using a T5-large backbone, respectively.